28 research outputs found

    Speeding up RDF aggregate discovery through sampling

    Get PDF
    RDF graphs can be large and complex; finding interesting information within them is challenging. One easy way for users to discover such graphs is to be shown interesting aggregates (in the form of two-dimensional graphs, i.e., bar charts), where interestingness is evaluated through statistical criteria. Dagger [5] pioneered this approach; however, it is quite inefficient, in particular due to the need to evaluate numerous, expensive aggregation queries. In this work, we describe Dagger+, which builds upon Dagger and leverages sampling to speed up the evaluation of potentially interesting aggregates. We show that Dagger+ achieves very significant execution time reductions while reaching results very close to those of the original, less efficient system.
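
    A minimal sketch, assuming tabular facts derived from the RDF graph, of the sampling idea described above: score each candidate aggregate on a small uniform sample with a simple statistical criterion (here, the variance of the bar heights), and run the full, expensive aggregation only for candidates that look promising. This is not the actual Dagger+ implementation; all names, the criterion, and the thresholds are illustrative.

```python
import random
from collections import defaultdict
from statistics import pvariance

def aggregate(facts, dimension, measure):
    """Group facts by 'dimension' and average 'measure': one bar chart."""
    groups = defaultdict(list)
    for fact in facts:
        groups[fact[dimension]].append(fact[measure])
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}

def interestingness(bars):
    """Illustrative statistical criterion: variance of the bar heights."""
    return pvariance(list(bars.values())) if len(bars) > 1 else 0.0

def promising_aggregates(facts, candidates, sample_ratio=0.05, threshold=1.0):
    """Filter candidate (dimension, measure) pairs on a sample, then evaluate fully."""
    sample = random.sample(facts, max(1, int(len(facts) * sample_ratio)))
    kept = []
    for dimension, measure in candidates:
        estimate = interestingness(aggregate(sample, dimension, measure))
        if estimate >= threshold:                      # cheap filter on the sample
            kept.append((dimension, measure, aggregate(facts, dimension, measure)))
    return kept
```

    Here 'facts' would be dictionaries materialized from RDF triples (one per resource), and 'candidates' the dimension/measure pairs to explore.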

    Semi-automatic support for evolving functional dependencies

    Get PDF
    During the life of a database, systematic and frequent violations of a given constraint may suggest that the represented reality is changing and thus the constraint should evolve with it. In this paper we propose a method and a tool to (i) find the functional dependencies that are violated by the current data, and (ii) support their evolution when it is necessary to update them. The method relies on the use of confidence, as a measure that is associated with each dependency and allows us to understand "how far" the dependency is from correctly describing the current data; and of goodness, as a measure of balance between the data satisfying the antecedent of the dependency and those satisfying its consequent. Our method compares favorably with literature that approaches the same problem in a different way, and performs effectively and efficiently, as shown by our tests on both real and synthetic databases.
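
    The abstract does not spell out how confidence is computed; the sketch below uses the common g3-style definition (the largest fraction of rows that can be kept so that the dependency holds exactly), which matches the intuition of "how far" a dependency is from the current data. The goodness measure is specific to the paper and is not reproduced here; all names are illustrative.

```python
from collections import Counter, defaultdict

def fd_confidence(rows, antecedent, consequent):
    """g3-style confidence of the FD antecedent -> consequent: the largest fraction
    of rows that can be kept so that the dependency holds exactly.
    'rows' is a list of dicts; 'antecedent'/'consequent' are tuples of column names."""
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        lhs = tuple(row[a] for a in antecedent)
        rhs = tuple(row[c] for c in consequent)
        groups[lhs][rhs] += 1
    keepable = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return keepable / len(rows)

# Example: a systematically violated dependency whose low confidence suggests evolution.
rows = [
    {"zip": "20100", "city": "Milan"},
    {"zip": "20100", "city": "Milan"},
    {"zip": "20100", "city": "Milano"},   # recurring violation
]
print(fd_confidence(rows, ("zip",), ("city",)))   # 0.666...
```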

    Learning from, Understanding, and Supporting DevOps Artifacts for Docker

    Full text link
    With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii) the lack of semantic rule-based analysis. To address these challenges, we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub repositories. Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles, and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed challenge (i) by reducing the number of effectively uninterpretable nodes in our ASTs by over 80% via a technique we call phased parsing. To address challenge (ii), we introduced a novel rule-mining technique capable of recovering two-thirds of the rules in a benchmark we curated. Through this automated mining, we were able to recover 16 new rules that were not found during manual rule collection. To address challenge (iii), we manually collected a set of rules for Dockerfiles from commits to the files in the Gold Set. These rules encapsulate best practices, avoid docker build failures, and improve image size and build latency. We created an analyzer that used these rules, and found that, on average, Dockerfiles on GitHub violated the rules five times more frequently than the Dockerfiles in our Gold Set. We also found that industrial Dockerfiles fared no better than those sourced from GitHub. The learned rules and analyzer in binnacle can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
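
    binnacle's mined rules and phased parser are not reproduced in the abstract, so the sketch below only illustrates the general shape of a semantic, rule-based Dockerfile check, using one widely known best practice (apt-get update and apt-get install belong in the same RUN instruction) as a stand-in rule; the function names are illustrative.

```python
import re

def check_apt_update_install(dockerfile_text):
    """Flag RUN instructions that call 'apt-get update' without a matching
    'apt-get install' in the same instruction (a common cache-staleness issue)."""
    violations = []
    # Join lines continued with a trailing backslash so each instruction is one string.
    text = re.sub(r"\\\s*\n", " ", dockerfile_text)
    for instr_no, line in enumerate(text.splitlines(), start=1):
        instr = line.strip()
        if instr.upper().startswith("RUN") and "apt-get update" in instr:
            if "apt-get install" not in instr:
                violations.append((instr_no, "apt-get update without install in the same RUN"))
    return violations

example = """FROM ubuntu:20.04
RUN apt-get update
RUN apt-get install -y curl
"""
print(check_apt_update_install(example))   # flags instruction 2
```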

    Process conformance checking by relaxing data dependencies

    Get PDF
    Given the events modeled by a business process, in the presence of alternative execution paths it may happen that the data required by a certain event somehow determines which event is executed next. The process can then be modeled using an approximate functional dependency between the data required by both events. We apply this approach in the context of conformance checking: given a business process model with a functional dependency (FD) that no longer corresponds to the observed reality, we propose corrections to the FD to make it exact, or at least to improve its confidence and produce a more accurate model.
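
    A minimal sketch of one plausible correction strategy consistent with the abstract: when an FD over event data loses confidence, try extending its antecedent with another attribute and keep the extension that raises confidence the most. The confidence function repeats the hedged g3-style measure from the sketch above; the strategy and all names are illustrative, not the paper's algorithm.

```python
from collections import Counter, defaultdict

def confidence(rows, lhs_cols, rhs_cols):
    """Fraction of rows that can be kept so that lhs_cols -> rhs_cols holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[c] for c in lhs_cols)][tuple(row[c] for c in rhs_cols)] += 1
    return sum(c.most_common(1)[0][1] for c in groups.values()) / len(rows)

def suggest_antecedent_extension(rows, lhs_cols, rhs_cols):
    """Propose the single extra attribute that most improves the FD's confidence."""
    base = confidence(rows, lhs_cols, rhs_cols)
    candidates = set(rows[0]) - set(lhs_cols) - set(rhs_cols)
    scored = [(confidence(rows, lhs_cols + (col,), rhs_cols), col) for col in candidates]
    best_conf, best_col = max(scored, default=(base, None))
    return (best_col, best_conf) if best_conf > base else (None, base)

# Event data where 'payment -> next_event' alone no longer determines what happens next,
# but adding the customer's country restores an exact dependency.
log = [
    {"payment": "card", "country": "IT", "next_event": "ship_express"},
    {"payment": "card", "country": "FR", "next_event": "ship_standard"},
    {"payment": "card", "country": "IT", "next_event": "ship_express"},
]
print(suggest_antecedent_extension(log, ("payment",), ("next_event",)))  # ('country', 1.0)
```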

    Extraction, Sentiment Analysis and Visualization of Massive Public Messages

    No full text
    This paper describes the design and implementation of tools to extract, analyze and explore an arbitrarily large amount of public messages from diverse sources. The aim of our work is to flexibly support sentiment analysis by quickly adapting to different use cases, languages, and message sources. First, a highly parallel scraper has been implemented, allowing the user to customize its behavior with scripting technologies and thus manage dynamically loaded content. Then, a novel framework is developed to support agile programming, building and validating a classifier for sentiment analysis. Finally, a web application allows the real-time selection and projection of the analysis results along different dimensions in an OLAP fashion.
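
    The paper's own scraper and classification framework are not described here in enough detail to reproduce; as a stand-in for the "build and validate a classifier" step, the sketch below shows a generic scikit-learn pipeline with cross-validation. The data is made up and the model choice is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy labelled messages; in practice these would come from the scraper.
messages = ["great service, thanks!", "this is terrible", "love it", "worst app ever",
            "pretty good overall", "awful experience", "fantastic support", "not happy at all"]
labels   = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

# Build the classifier: TF-IDF over word unigrams/bigrams feeding a linear model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))

# Validate quickly with cross-validation before running on the full message stream.
scores = cross_val_score(model, messages, labels, cv=4)
print("accuracy per fold:", scores)

model.fit(messages, labels)
print(model.predict(["really good", "so bad"]))
```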

    Data Mining for XML Query-Answering Support

    No full text

    A declarative extension of horn clauses, and its significance for datalog and its applications

    No full text
    FS-rules provide a powerful monotonic extension for Horn clauses that supports monotonic aggregates in recursion by reasoning on the multiplicity of occurrences satisfying existential goals. The least fixpoint semantics, and its equivalent least model semantics, hold for logic programs with FS-rules; moreover, generalized notions of stratification and stable models are easily derived when negated goals are allowed. Finally, the generalization of techniques such as seminaive fixpoint and magic sets makes possible the efficient implementation of DatalogFS, i.e., Datalog with rules with Frequency Support (FS-rules) and stratified negation. A large number of applications that could not be supported efficiently, or could not be expressed at all, in stratified Datalog can now be easily expressed and efficiently supported in DatalogFS, and a powerful DatalogFS system is now being developed at UCLA.
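
    A minimal sketch, in Python rather than DatalogFS syntax, of the kind of monotonic frequency-based recursion FS-rules express: a node becomes "adopting" once at least k of its in-neighbours are adopting, computed by naive fixpoint iteration. The threshold counting over an existential goal is what an FS-rule would state declaratively; the scenario, names, and threshold are illustrative.

```python
def adopters_fixpoint(edges, seeds, k=2):
    """Least fixpoint of: adopt(X) <- seed(X).
                          adopt(X) <- at least k facts edge(Y, X) with adopt(Y).
    Counting over the existential goal is monotonic, so naive iteration converges."""
    adopting = set(seeds)
    changed = True
    while changed:
        changed = False
        for node in {dst for _, dst in edges}:
            if node in adopting:
                continue
            support = sum(1 for src, dst in edges if dst == node and src in adopting)
            if support >= k:
                adopting.add(node)
                changed = True
    return adopting

edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e"), ("c", "e")]
print(adopters_fixpoint(edges, seeds={"a", "b"}, k=2))   # {'a', 'b', 'c', 'd', 'e'}
```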